From: wjh+@andrew.cmu.edu (Fred Hansen)


Recently I offered to post a summary of the algorithms used in the
American Go Association rating system. The summary below was written by
Paul Matthews, author of the rating system. It comes as part of the
software for the "Accelerated Pairing System," which is a practical and
equitable system for pairing players in tournaments.

- - - - - - - - - - - - -
INSIDE THE AGA RATINGS SYSTEM   7/28/90
Paul Matthews, Princeton Go Society


INTRODUCTION

Questions about ranks and ratings, who's really stronger, and how one
part of the world compares with another, probably have no
once-and-for-all-time answers. Local, national and international
traditions evolve, players enter and leave active competition, the
general level of go knowledge increases, and new champions appear. Yet
there is a persistent interest in having some kind of measurement and
recognition of playing strength. The AGA approach for many years has
been to publish ratings, numbers on a continuous scale that can be
equated roughly to traditional amateur ranks, but that reflect the ups
and downs of competitive play.

In 1988 and 1989, the AGA ratings system was extensively overhauled.
Phil Straus, Paul Matthews, Bob High, Steve Fawthrop, Laurie Sweeney,
Richard Cann, Bruce Ladendorf, Nick Patterson, and others contributed
mightily of their time and expertise to launch the new system. Although
the initial goal was to correct logical inconsistencies that had crept
into the old system, the bulk of the work turned out to be concerned
with data integrity, tournament reporting practices, computer software
development, and proving to each other that the new system really
worked. The present article takes an inside look at the new system.


NUMERICAL SCALE

Ratings are expressed on a scale of 100 and up for dan-level players,
and -100 and down for kyu-level players. Dividing a rating by 100
yields the rank equivalent; thus, 276 is a 2 dan rating, and -432 is 4
kyu. Because there is no rank between 1 kyu and 1 dan, there are no
ratings between -100 and 100, which can be confusing when doing ratings
arithmetic.

When a player first enters the system, his or her self-declared rank is
translated to a provisional rating. For example, 6 dan is translated to
650, and 1 kyu to -149. Ratings adjust quickly, so that a new player
reaches the right level in just a few tournaments, and no player's
rating gets stuck; this is one of the improvements over the old system.
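
As a concrete illustration, here is a small sketch of this arithmetic
in Python. The AGA software itself is in C, and the general
entry-rating offsets below are only inferred from the 650 and -149
examples given above, so treat them as assumptions.

    def rating_to_rank(rating):
        """Rank equivalent of a rating; (-100, 100) is unused."""
        if rating >= 100:
            return "%d dan" % (rating // 100)
        if rating <= -100:
            return "%d kyu" % (-rating // 100)
        raise ValueError("no ratings between -100 and 100")

    def entry_rating(level, is_dan):
        """Provisional rating for a self-declared rank.

        Offsets inferred from the examples in the text
        (6 dan -> 650, 1 kyu -> -149); the real table may differ.
        """
        return 100 * level + 50 if is_dan else -(100 * level + 49)

    print(rating_to_rank(276))       # 2 dan
    print(rating_to_rank(-432))      # 4 kyu
    print(entry_rating(6, True))     # 650
    print(entry_rating(1, False))    # -149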


CREDIBILITY

Your AGA rating does not tell you precisely how strong you are. What it
does tell you is how you stand relative to other players based on your
recent performance in tournaments and other rated events. Your
perception of your strength is based on more games than are rated, and
it may be more accurate, particularly if you have been playing at about
the same level for several years. However, if your estimate differs
radically from your AGA rating, say higher by as much as 200 points,
then most players would agree that you have something to prove, and
would be quite willing to give you the chance! Discrepancies of up to
100 points are within the range of statistical error, but if your
rating were chronically 100 points below your claimed rank, then you
ought to reassess the strength of your play.

Be aware that many of your opponents may exaggerate their rank. In
tournaments, players often enter at a higher rank to gain experience.
But the ratings system sees them as they are, and consequently, your
victories may not gain as many rating points as you think they should,
and your losses may be more serious. In the United States, about one
third of the players who claim ranks between 6 kyu and 3 dan have
ratings that are one or more ranks lower. However, the ratings of
players below 6 kyu and above 3 dan agree remarkably well with their
claimed ranks.


STATISTICAL MODEL

A statistical model is indispensable to avoid logical inconsistencies
and to do ratings arithmetic properly. In common with the Elo system
used internationally in chess, the AGA model expresses the probability
of winning a game as a function of rating difference. This so-called
"percentage expectancy" curve, PX, is represented as a cumulative
normal probability distribution function with standard deviation
px_sigma. Working backward from this assumption, it is possible to
infer likely rating differences given actual game results.
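
In symbols, if Phi is the standard normal cumulative distribution
function, the model sets PX(d) = Phi(d / px_sigma) for a rating
difference of d points. A one-line sketch in Python (scipy supplies
the normal CDF):

    from scipy.stats import norm

    def PX(d, px_sigma=104.0):   # px_sigma value from the table below
        """Probability that the stronger player wins, difference d."""
        return norm.cdf(d / px_sigma)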

One problem this approach must address is to estimate a rating
difference based on a single game, or any set of games where one player
always wins. The mathematics of simple maximum likelihood estimation
would suggest that the winning player is likely to be infinitely
stronger than the loser! Given that most games are approximately evenly
matched, this inference is obviously unreasonable, and it ignores the
fact that we have some prior knowledge about the players. The AGA
system uses Bayesian statistical methods to solve the problem. The
essential idea is to capture the notion that players are probably about
the strength they say they are; the technical device is a normal
probability density function, called the "rating prior," RP, centered
on the player's presumed rating and with standard deviation rp_sigma.
For one game, the Bayesian likelihood is of the form

    likelihood(outcome) = RP(rating1) * RP(rating2)
                          * PX(outcome | rating1 - rating2)

At some point, the increase in PX likelihood as the estimated ratings
of the two players spread apart is balanced by decreases in the
players' RP likelihoods as ratings are stretched farther from their
prior presumed strengths; new ratings are defined by the balance point
where the likelihood is at a maximum. The magnitude of the rating
change is determined by rp_sigma, larger values allowing larger
movements.
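
The balance point can be found numerically. The sketch below (Python
with scipy; the real implementation is in C and its numerical method
is not described here) maximizes the one-game likelihood above for two
players both presumed to be 2 dan, using the parameter values given
later in this article:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    PX_SIGMA = 104.0   # percentage expectancy spread
    RP_SIGMA = 80.0    # rating prior spread

    def neg_log_likelihood(r, prior1, prior2):
        """-log of RP(r1) * RP(r2) * PX(win | r1 - r2)."""
        r1, r2 = r
        ll  = norm.logpdf(r1, prior1, RP_SIGMA)   # rating prior, winner
        ll += norm.logpdf(r2, prior2, RP_SIGMA)   # rating prior, loser
        ll += norm.logcdf((r1 - r2) / PX_SIGMA)   # player 1 won
        return -ll

    # Two players presumed equal at 200 (2 dan); player 1 wins once.
    result = minimize(neg_log_likelihood, x0=np.array([200.0, 200.0]),
                      args=(200.0, 200.0))
    print(result.x)   # winner up, loser down, roughly 30 points each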

For multiple games, the RPs for all the players, and the PXs for all
the games, are multiplied together to obtain the overall likelihood.
This connects the ratings of all players together in a network of
interlocking games, and improves the stability and accuracy of ratings
compared with updating ratings one game at a time. The maximum
Bayesian likelihood is found numerically by simultaneously adjusting
all the ratings until the best (i.e., most likely) combination is
found.
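
Extending the sketch above to a whole set of games is direct: sum the
log prior for every player and the log percentage expectancy for every
game, then adjust all the ratings at once (reusing the imports and
constants from the previous sketch; again an illustration, not the AGA
code).

    def joint_nll(ratings, priors, games):
        """-log likelihood over all players and all games.

        priors: array of presumed ratings; games: (winner, loser)
        index pairs into the ratings array.
        """
        ll = norm.logpdf(ratings, priors, RP_SIGMA).sum()
        for w, l in games:
            ll += norm.logcdf((ratings[w] - ratings[l]) / PX_SIGMA)
        return -ll

    priors = np.array([200.0, 200.0, 100.0])
    games = [(0, 1), (1, 2), (0, 2)]   # 0 beats 1, 1 beats 2, 0 beats 2
    result = minimize(joint_nll, x0=priors.copy(), args=(priors, games))
    print(result.x)   # all three ratings move simultaneously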


PARAMETER VALUES

The current values of the AGA ratings system parameters are shown in
the table below. A px_sigma value of 104 implies that a player who is
stronger by a full rank (i.e., 100 rating points) should win about 83%
of the time; the percentage for two ranks is 97%. The value of
px_sigma was chosen, based on the analysis of thousands of games, to be
consistent with the model that the rating point equivalent of an n
stone handicap is 100n.


RATINGS SYSTEM PARAMETER VALUES

    px_sigma = 104
    rp_sigma = 80

    Rating point equivalents of handicaps:
        50 - 10 * komi,           if stones = 0
        100 * stones - 10 * komi, if 2 <= stones <= 9
        where -20 <= komi <= 20
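
In code (a sketch; the percentages quoted above can be checked against
the normal CDF):

    from scipy.stats import norm

    def handicap_value(stones, komi):
        """Rating point equivalent of a handicap, per the table above."""
        assert -20 <= komi <= 20
        if stones == 0:
            return 50 - 10 * komi
        if 2 <= stones <= 9:
            return 100 * stones - 10 * komi
        raise ValueError("stones must be 0 or 2..9")

    print(norm.cdf(100 / 104.0))   # ~0.83: one rank stronger wins 83%
    print(norm.cdf(200 / 104.0))   # ~0.97: two ranks stronger wins 97%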

The parameter rp_sigma expresses the uncertainty associated with old
ratings; in practice, rp_sigma controls the volatility of ratings. The
current default value of 80 was chosen so that the average rating point
value of a single game is 30, which limits the expected maximum gain in
a five round tournament to 150 rating points. Simulations showed that
both very large and very small values of rp_sigma work poorly, leading
to severe fluctuations or stagnant ratings, respectively.

The rating point equivalent of no komi, the so-called "one stone"
handicap, is significantly less than 100, a fact that was also
recognized in the old ratings system. The rating point values of other
komi handicaps are an interesting topic for future statistical
investigation. The data currently available, much of it provided by
Wayne Nelson, suggests that every point of komi compensates for about
10 rating points. Thus, since the value of the first move (i.e., taking
Black) is about 50 rating points, a reverse komi of 5 1/2 points should
come close to compensating for a full rank difference.
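
In terms of the handicap table above, with a reverse komi entered as
negative komi (my reading of the sign convention): stones = 0 and
komi = -5.5 gives 50 - 10 * (-5.5) = 105 rating points, close to the
100 points of a full rank.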


IMPROVING PLAYERS

Many players believe that they are growing stronger, and are annoyed if
their rating lags behind their self-assessment. The default value of
rp_sigma seems sufficient for routine rating adjustments; however, a
rapidly improving player may play at a rank several hundred points
above his or her old rating, and a boost is needed. Players who
declare a rank more than 50 points higher than their rating have the
mean and standard deviation parameters of their RP function increased.
By adding points to the RP mean, points are added to the whole system,
helping to counteract the tendency for the ratings of stable players to
deflate as other players improve. The larger standard deviation allows
an improving player's rating to float more freely, upward or downward,
and to have less effect on the ratings of opponents. Note that a
player who performs poorly when playing above his or her rating risks a
larger loss of rating points.
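
A sketch of this adjustment (the 50-point trigger is from the text;
the boost amounts below are hypothetical placeholders, not the
published AGA values):

    def prior_parameters(declared, old_rating,
                         mean_boost=25.0, sigma_factor=1.5):
        """Mean and sigma of a player's rating prior, RP.

        mean_boost and sigma_factor are illustrative guesses.
        """
        mean, sigma = old_rating, 80.0    # default RP parameters
        if declared - old_rating > 50:    # self-declared improver
            mean += mean_boost       # adds points to the whole system
            sigma *= sigma_factor    # rating floats more freely
        return mean, sigma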


SOFTWARE

The AGA ratings system is a suite of programs implemented (in C) for
IBM PC compatible machines running DOS. The ratings system software
has been extended to provide on-site support for a wide variety of
handicap and championship tournaments, both small and large. Now
tournament directors can generate on-the-spot ratings based on entry
ranks and tournament games, and can even use the ratings to do pairings
and figure out the tournament winners! These extensions are called the
"Accelerated System." Significant effort is also being devoted to
software that supports the verification and correction of AGA ID#s and
names, preferably at the tournament site.


FUTURE WORK

The revitalized AGA ratings system is a world-class system that is a
credit to the AGA and the go world. But it will never be perfect, and
work continues. Phil Straus, the AGA Ratings Commission chairperson,
is doing a super job in coordinating and motivating many activities
relating to ratings. Some of the areas currently being addressed are:
a comparison of ranks in foreign countries with AGA ratings; rating
the games of professional players; and better tournament practices to
improve data integrity.